pacman::p_load(dplyr, tidyverse, DT, ggplot2, ggiraph,
ggstatsplot, patchwork, plotly, gganimate,
ggthemes, corrplot, heatmaply)Take-home Exercise 3
The focus of this exercise is to perform insightful data visualization to uncover the salient patterns of the resale prices of public housing property by residential towns and estates in Singapore.
1. Overview
This take-home exercise aims to reveal the salient patterns of the resale prices of public housing properties by residential towns and estates in Singapore.
The dataset used in take-home exercise 3 is downloaded from Data.gov.sg, and processed by RStudio tidyverse family of packages and visualized by ggplot2 and its extensions.
1.1. Loading Libraries
Using pacman package to load visualization packages.
1.2. Loading Data
Previewing the head rows of the data to observe the columns.
resale_data <- read_csv("../Data/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
options(DT.options = list(pageLength = 5))
DT::datatable(head(resale_data,20),class ="cell-border stripe",style = "bootstrap") #%>% formatStyle(names(resale_data),backgroundColor = 'grey')1.3. Data Pre-Processing
For this study, we are interested in the 3-ROOM, 4-ROOM and 5-ROOM flats that are sold during the year of 2022 (except using data of 2017-2023 for trend analysis).
unique(resale_data$flat_type)[1] "2 ROOM" "3 ROOM" "4 ROOM" "5 ROOM"
[5] "EXECUTIVE" "1 ROOM" "MULTI-GENERATION"
First, we transform the data by extracting year and month from column month, and calculating the remaining lease years and price per square meter, using the tidyverse library.
resale_345 <- resale_data %>%
# and 3/4/5 ROOM in flat_type column
filter(str_detect(flat_type, '3 ROOM|4 ROOM|5 ROOM')) %>%
# extract the year of sale
mutate(sale_year=as.integer(str_sub(month, start = 1L, end = 4L))) %>%
# extract the month of sale
mutate(sale_month=as.integer(str_sub(month, start = 6L, end = -1L))) %>%
# extract the remaining lease in years
mutate(remaining_lease_years=as.integer(str_sub(remaining_lease, start = 1L, end = 2L))) %>%
# calculate the price/sqm
mutate(psqm=round(resale_price / floor_area_sqm, 2))
DT::datatable(head(resale_345,100),class ="cell-border stripe",style = "bootstrap") %>%
DT::formatStyle(columns = colnames(.), fontSize = '20%')2. Data Analysis and Visualizations
2.1. Timely Trend (From 2017 - 2023)
2.1.1. Timely Trend of Price Per Sqm. by Flat Type
grp2 <- resale_345 %>%
group_by(sale_year, flat_type) %>%
summarise(no_sales=n(), avg_psqm=round(mean(psqm),2))
tt <- c(paste("Year:", grp2$sale_year, "<br>Price/SQM: $", grp2$avg_psqm))
fig2 <-grp2 %>%
ggplot(aes(x = sale_year, y = avg_psqm,colour = flat_type)) +
geom_smooth(alpha = 0.1,se=FALSE) +
geom_point_interactive(aes(tooltip = tt),size = 5) +
theme_excel_new()+
scale_x_continuous(breaks = seq(2017,2023,by = 1),limits = c(2017,2023)) +
scale_y_continuous(breaks = seq(0,6500,by = 1000),limits = c(4000,6500)) +
labs(title = "Avg. Resale Unit Price, 2017 - 2023", x = 'Year', y = 'Avg. Resale Price') +
scale_color_manual(values = c("#3d7dcc", "#f57ee0", "#02e302"))
girafe(
ggobj = fig2, width_svg = 12
)Consideration: lineplot is chosen for clearer observation of the change in average unit price. For more obvious illustration of the trend, the data was grouped by year instead of month. When user hover on the data points, the corresponding unit price of the specific flat type in that year will be displayed.
Analytical insights: from the visualization, we found out:
- Before 2020, the unit resale price in Singapore is fairly stable, fluctuating around SGD 4,500.
- After 2020, the average unit price increased rapidly year by year; this may be affected by the short in labour force due to COVID-19 and the increased demand.
2.1.2. Timely Trend of No. of Sales
# Group by month and flt type
grp <- resale_345 %>%
group_by(month, flat_type) %>%
summarise(no_sales=n(), avg_psqm=round(mean(psqm),2))
# Minimal point of No. of sales
m <- grp[which.min(grp$no_sales), ]
a <- list(
x = m$month,
y = m$no_sales,
text = 'Circuit Breaker Period',
xref = "x",
yref = "y",
showarrow = TRUE,
arrowhead = 7,
ax = -100,
ay = -30
)
# Use Plotly to draw line chart by flat type
fig <- grp %>%
ungroup() %>%
plot_ly(x = ~month,
y = ~no_sales,
color= ~flat_type,
text = ~paste("month:", month, "<br>Sales:", no_sales),
type = 'scatter',
mode = 'lines') %>%
layout(title = list(text = 'Total No. of Sales of 3/4/5 Room Flats, 2017 - 2022',
pad = list(b = 90, l = 130, r = 50 )
),
xaxis = list(title = 'MONTH',
zeroline = TRUE),
yaxis = list(title = 'NO. OF SALES')
)
# Add min point annotation
fig <- fig %>% add_markers()
fig <- fig %>% layout(annotations = a)
figConsideration: lineplot is chosen for clearer observation of the sales volume fluctuation. Different room types are represented by different colors of lines in the chart. As for interactability, when user hover on a particular data point, year and month and the respective sales volume will be displayed.
Analytical insights: from the visualization, we found out:
- 4-ROOM flats are the most popular flat type across the whole period, whereas the sales volume of 5-ROOM flats has been slightly higher than that of 3-ROOM flats most of the time.
- There is a noticable drop in sales volume during April & May 2020; this was probably due to the circuit breaker policy during COVID-19 spread in Singapore.
2.1.3. Timely Trend of Price Per Sqm. by Area
means=resale_345 %>% group_by(town) %>% summarise(mpsqm=mean(psqm))
means# A tibble: 26 × 2
town mpsqm
<chr> <dbl>
1 ANG MO KIO 4997.
2 BEDOK 4823.
3 BISHAN 6070.
4 BUKIT BATOK 4400.
5 BUKIT MERAH 6676.
6 BUKIT PANJANG 4491.
7 BUKIT TIMAH 6450.
8 CENTRAL AREA 7850.
9 CHOA CHU KANG 4158.
10 CLEMENTI 5885.
# … with 16 more rows
# Calculate the mean psqm for each town and year combination
means <- resale_345 %>%
group_by(town, sale_year) %>%
summarise(mpsqm = mean(psqm))
p <- ggplot(resale_345, aes(x = psqm, color = factor(town), fill = factor(town))) +
geom_density(aes(frame=resale_345$sale_year)) +
facet_grid(town~., scales = "free", space = "free") +
geom_vline(data = means, aes(xintercept = mpsqm, frame=sale_year), alpha=0.4) +
theme_bw() +
theme(
legend.position = "none",
plot.margin = margin(1, 3, 0, 0, "cm"),
panel.grid = element_blank(),
panel.background = element_blank(),
axis.text = element_blank(),
strip.text.y = element_text(angle = 0,margin = margin(2,2,2,2, "cm")),
panel.spacing = unit(0, "cm", data = NULL)
)+xlab('Price Per Sqm. (SGD)')
fig3 <- ggplotly(p) %>% layout(title="Price Per Sqm. Distribution by Town")
fig3 <- fig3 %>%
animation_opts(
1200, easing = "linear", redraw = FALSE,
) %>%
animation_slider(
currentvalue = list(prefix = "YEAR ", font = list(color="black"))
)
fig3